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Statistical inference for discrete time observations of an affine stochastic delay differential equa- 
tion is considered. The main focus is on maximum pseudo-likelihood estimators, which are easy 
to calculate in practice. A more general class of prediction-based estimating functions is investi- 
gated as well. In particular, the optimal prediction-based estimating function and the asymptotic 
properties of the estimators are derived. The maximum pseudo-likelihood estimator is a partic- 
ular case, and an expression is found for the efficiency loss when using the maximum pseudo- 
likelihood estimator, rather than the computationally more involved optimal prediction-based 
estimator. The distribution of the pseudo-likelihood estimator is investigated in a simulation 
study. Two examples of affine stochastic delay equation are considered in detail. 
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1. Introduction 

In the last decade, statistical inference for stochastic delay differential equations (SDDEs) 
has been studied from various viewpoints. Early work on maximum likelihood estimation 
was done by Kiichler and Mensch [15]. Gushchin and Kiichler [8] and Kiichler and Kutoy- 
ants [14] determined the non-standard asymptotic properties of the maximum likelihood 
estimator for SDDEs, and Kiichler and Vasil'jev [20] constructed sequential procedures 
with a given accuracy in the L2 sense. Nonparametric estimators for affine SDDEs were 
investigated by Rcifi [22] and Rcifi [23] . All these studies were concerned with continuous 
observation of the solution process. 
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As opposed to the situation for ordinary stochastic differential equations, observations 
at discrete time points have been little studied for SDDEs. Rcifi [21] studied nonparamct- 
ric estimation. Kiichler and S0rensen [19] proposed a simple estimator for the parameters 
otk in the particular type of SDDE given by (2) below. This estimator is biased, however, 
and can only be expected to work well for high-frequency observations. In this paper 
we report a first attempt at investigating parametric inference for affinc stochastic de- 
lay equations of the general type (1) observed at discrete time points. We propose a 
pseudo-likelihood function and study it in the framework of prediction-based estimating 
functions. Applying the methods proposed here in practice often requires the ability to 
simulate solutions of SDDEs. This problem has been studied by, among other, Kiichler 
and Platen [16] and Buckwar [2]. A practical application of one of the simplest SDDEs, 
discussed in Example 2.1 below, was provided by Kiichler and Platen [17]. 

We consider the model given by the stochastic differential equation 



where a a is a measure on [— r, 0] (0 < r < oo) such that (1) has a unique stationary 
solution (for a suitable given initial condition). Conditions under which (1) has a unique 
stationary solution were given by Gushchin and Kiichler [9]. By Theorem 3.1 in this 
paper, the stationary solution is a Gaussian process. We assume that the measure a a 
depends on a parameter a. The parameter about which inference is to be drawn is (a, a) 
or (a, a, r), (a,r > 0). As usual, we denote the parameter space by 9C MP. The process 
W is a Wiener process. The initial condition is that the distribution of {X(s) | s G [— r, 0]} 
is the stationary distribution, which always has expectation 0. The data are observations 
at discrete time points: X(A),X(2A), . . .,X(nA). 
An interesting particular case of (1) is 



Here the measure a a is concentrated in the discrete points — r%, . . . , — rzv, (j-j > 0). The 
vector (ri, . . . , rjv) may be among the parameters to be estimated. The particular case 
where N = 2 and r\ = is considered in detail in Example 2.1. 

In Section 2 we discuss how to calculate the likelihood function for discrete time ob- 
servations, and propose a pseudo-likelihood function that closely approximates the likeli- 
hood function and is considerably easier to calculate. We consider two examples in detail. 
In Section 3 we present prediction-based estimating functions for affine stochastic delay 
equations, find the optimal estimating function in this class, and show that the pseudo- 
likelihood estimator is a particular case of a prediction-based estimator. The prediction- 
based estimating functions provide a good framework for discussing the asymptotics of 
the pseudo-likelihood estimator and in particular the efficiency loss compared with the 
optimal prediction-based estimating function. We do this in Section 4, specifying condi- 
tions ensuring consistency and asymptotic normality. Finally, in Section 5 we present a 
simulation study of properties of the pseudo-likelihood estimator. 




(1) 



N 




(2) 



k=l 
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2. The likelihood and the pseudo-likelihood function 

Because the stationary solution to (1) is a zero-mean Gaussian process, Gushchin and 
Kiichler [9], the data are in fact a Gaussian time series with expectation 0. Therefore, in 
principle the likelihood function can be calculated if we can determine, analytically or 
numerically, the autocovariances 

Ke(t) = E e (X(0)X(t)), t>0. (3) 

The autocovariance function, Kg(t), satisfies the differential equation 

d t K e {t) = J K e (t + s)a e (ds), t>0, (4) 

with dtKg(0+) = — i<7 2 , provided that we define Kg(—t) = Kg(t) for t > (see Gushchin 
and Kiichler [10]). The condition d t K e (0+) = -\o 2 also can be written in the form 



2 / K g (s)a e (ds) = -a 



2 



Equation (4) is a continuous-time analogue of the Yule-Walker equation known from 
time-series analysis, and hereinafter we refer to (4) as the delay Yule- Walker equation of 
(3). In general, this equation must be solved numerically, but we consider two particular 
examples where it can be solved explicitly. 

To calculate the likelihood function, define for every t = 1, . . . , n the ^-dimensional vec- 
tor n e (0) = (Kg (A),. . . , K (£A)) T , and the £ x ^-matrix K t (6) = {K e ((i- 3)A)}i,j=i,...,t- 
Here and later T denotes transposition of vectors and matrices. The matrix JCg(9) is the 
covariance matrix of the vector of the first £ observations X(A), . . . ,X(£A). 

The conditional distribution of the observation X((i + 1)A) given the previous obser- 
vations X(A), . . . , X(iA) is the Gaussian distribution with expectation 4>i(9) J ' Xi-\ and 
variance Vi(&), where 4>i(9) is the i-dimensional vector given by <fii(0) = ICi(9)~ 1 Ki(6) , 
Vi{6) =Kg(0) - KiiefKiiO)- 1 ^), and X l:J = (Jf(iA), . . . , X(j A)) T , i > j > 1. The 
vector <pi(9) = (<pi^(6), . . . ,(j>i.i(9)) T and the conditional variance Vi(9) can be found us- 
ing the Durbin-Lcvinson algorithm (see, e.g., page 169 in Brockwcll and Davis [1]). 
Specifically, <£i,i(0) = Kg(A)/Kg(0) and v (6)=K e (0), whereas 





and 



v i (e) = v i - 1 (9)(l-<t> i , i (9) 2 ). 
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The likelihood function based on the data X(A), . . . ,X(nA) is 



L n (0) 



1 



y/2nvo(0) 

n-l 



exp 



i=l 



2v a (8) 



exp 



X(A)'- 



2vi{6) 



(5) 



(X((i + l)A)-^(0) T X i:1 )' 



Calculation of this function quickly becomes very time-consuming as the sample size n 
increases. In particular, <f>i(9) and Vi{&) must be calculated for every observation time 
point. However, the autocovariances Kg(iA) decrease exponentially with i (sec Dick- 
mann et al. [3], page 34). Using the Durbin-Levinson algorithm, it is readily apparent 
that this implies that the quantities <t>i,j(Q) decrease exponentially with j. Thus the con- 
ditional distribution of X((i + 1)A) given X(A), . . . ,X(iA) depends only very little on 
observations in the distant past. 

Therefore, we propose using instead a pseudo-likelihood function obtained by replacing 
in the likelihood function the conditional density of X((i + 1)A) given X(A), . . . ,X(iA) 
with the conditional density of X((i + 1)A) given X((i + 1 — fc)A), . . . ,X(iA), where 
k typically is relatively small. This pseudo-likelihood function was proposed by H. 
S0rcnsen [24] in connection with stochastic volatility models, but the idea is widely 
applicable. The pseudo-likelihood is given by 



L n (B) = I 



y/2mi k {e) 



exp 



2v k {6) 



{X{{i + l)A)- ( j )k {e) T X l .. l+1 _ k )' 



(6) 



We have not included the density of X k .\. Note that the computational gain is large 
because we calculate (6) using the same values of 4> k (6) and v k (9) for all observation time 
points. Thus, these quantities must be calculated only once for every value of L n {9). We 
call the number k the depth of the pseudo-likelihood function. We consider the influence 
of k on the quality of the estimators in the simulation study reported in Section 5. As 
would be expected, the quality increases with increasing depth. For the model considered 
in Section 5, the present study indicates that the bias and the variance of the estimators 
do not depend much on the depth when k is larger than 3-5 times r. 



Example 2. 1 . Consider the equation 



dX(t) = [aX(t) + bX(t-r)]dt + adW(t), (7) 

where r > 0, a > 0. This is a particular case of the model (2). The real parameters a and 
b are chosen such that a stationary solution of (7) exists. This is the case exactly when 
a < r^ 1 and — a/ cos(£(ar)) < b < —a if a 7^ 0, and when — 7t/2 < br < if a = 0. Here 
the function £(it) G (0,7t) is the root of = utan(£(u)) if u ^ 0, and £(0) = n/2. The 
stationary solution is unique if it exists. Details of this have been provided by Kiichler 
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and Mensch [15], who explicitly found the covariance function of the stationary solution 
by solving the Yule- Walker delay differential equation (4), 



They found that 



K e (0) = < 



d t K e {t) = aK e {t) + bK e (t - r), 

' CT 2 (6sinh(A(a,6)r) - A(a,fc>)) 
2A(a, b) [a + 6cosh(A(a, b)r)] 
a 2 (br-l)/(4b) 
a 2 (bsm(X(a,b)r) - X(a,b)) 
t 2X(a,b)[a + bcos(X(a,b)r)} 



£>0. 

when |6| < —a, 
when b = a, 
when b < — lal, 



(8) 



where A(a, b) = \J[a 2 — b 2 [, and that for t £ [0,r] the covariance function is 

K g (Q)cosh(X(a,b)t) ~ a 2 (2X(a,b))~ 1 sinh(X(a,b)t) when \b\ < -a, 

when b = a, 
when b < —lal. 



K 8 (t)={ K e (0)-±ta 2 



(9) 



K g (0) cos(A(a, b)t) - a 2 (2X(a, sin(A(a, b)t) 



Because Kg(t) is known in [0,r], the Yule- Walker equation becomes an ordinary differ- 
ential equation for Kg(t) in [r, 2r], which can be easily solved. Similarly, for t>r, the 
autocovariance function Kg (t) is given by 



Kg(t) 



' a{t - s) Kg(s~r)ds + c a ^- nr) Kg(nr), t G [nr, (n + l)r],n€ N. (10) 



Thus Kg{t) can be determined iteratively in each of the intervals £ £ [nr, (n + l)r], iieN. 
Note that the covariance function depends on a and r in a simple and smooth way, 
so that these parameters also can be estimated by maximizing the pseudo-likelihood 
function (6). 

For 6 = 0, the model (7) is the Ornstein-Uhlcnbeck process, for which (8) and (9) 
simplifies to the well-known result Kg{t) = —(a 2 /(2a))e at (£ > 0) in the stationary case 
a < 0. For a — 0, we obtain the model 



dX(t) = bX(t - r)dt + adW(t). 



(11) 



This process is stationary if and only if br 6 (— 7t/2,0), and in this case, by (8) and (9), 
the autocovariance function is given by 



Kg(t) 



a 2 ( 1 — sin(6r) 



2b \ cos(br) 



cos(6£) + sin(6£) 



(12) 



when t e [0,r]. By (10), we find that 



Kg(t) = --^[2 + cos(&£){(tan(&£) - tan(6r))(l - 2sin(6r)) - l/cos(6r)}] (13) 



for £ £ [r, 2r] . 
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Example 2.2. Consider the equation 

AX{t) = -b(^J X(t + s)e as ds^j dt + adW(t), (14) 

where r > 0, a > 0. The set of values of the parameters a and b for which a unique sta- 
tionary solution of (14) exists was studied by Reifi [21]. This set is rather complicated and 
irregular; for instance, it is not convex. However, it contains the region {(a, b) \ a > 0, b > 
0,6(1 + e _ar ) < max(7t 2 /r 2 , a 2 (e ar — l) 2 )}. For a = 0, corresponding to a uniform de- 
lay measure, a stationary solution exists exactly when < b < ^n 2 jr 2 . When r = oo, the 
situation is much simpler. In that case, a stationary solution exists for all a > and b > 0. 
When a = (and r is finite), 



a^2 b (l/2-t)) Q<t<r 
2rV26cos(r v /672) 2br 



For a > 0, an explicit expression for Kg(t) involving trigonometric functions exists as 
well (see Reifi [21], page 41), but it is somewhat complicated, and thus we omit it here. 

3. Prediction-based estimating functions 

In this section, we discuss the pseudo-likelihood estimator in the framework of prediction- 
based estimating functions. This class of estimating functions was introduced by 
S0rensen [25] as a generalization of the martingale estimating functions that is also 
applicable to non-Markovian processes such as solutions to stochastic delay differential 
equations. Applications of the methodology to observations of integrated diffusion pro- 
cesses and sums of diffusions have been described by Ditlevsen and S0rensen [4] and 
Forman and S0rensen [6]. An up-to-date review of the theory of prediction-based esti- 
mating functions has been provided by S0rensen [26]. 

We show that the pseudo-likelihood estimator is a prediction-based estimator, and find 
the optimal prediction-based estimating function, which turns out to be different from 
the pseudo-score function. Optimality is in the sense of Godambe and Heyde [7] (see 
Heyde [11]). We impose the following condition that is satisfied for the models considered 
in Examples 2.1 and 2.2. 

Condition 3.1. The function Kg(t) is continuously differentiable with respect to 9. 
Under this assumption, we find the following expression for the pseudo-score function: 



d e £ n (9) :=d e \og(L n (9)) 

'-(X((i + 1)A) - 4> k {9f Xi+i-fc) (15) 



- 1 d e M&) T x t -,+i- k 



E 



v k {9) 



£[(*((» + 1)A) - MOf^+i-kf - «*(*)]. 
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The derivatives de4>k(0) and dgVk(8) exist when Kg(t) is differentiable and can be found 
by the following algorithm, which is obtained by differentiating the Durbin-Levinson 
algorithm: 



d e ct>i,i{6) = 



i-l 



d 6 K e {iA) - J2(9e^i-i),MKe((i - j)A) 

^ 3=1 

+ 4>(i-i)A e ) d eKe((i - i)A)) ) v^O) 



\d ej <t>i,i-i{e) . 



K e (iA) -Y,<t>(i-i)j(fi)Ko((i-3)*) )d e vi-i(e) 

I 



vi-i(ey 



de^i-iAO) 




/ d 9j 4>i-\,i-\ (9) ' 

-4>iM \ 

\ d 6j <t>i-i,x{0) 

for j = 1, . . . ,p, and 

dev t (6) = dgv i - 1 {6)(l - M (0) 2 ) - 2« i _ 1 (0)0 M (0)fy0 M (0). 

The minimum mean squared error linear predictors of + 1)A) and (^((* + 1)A) — 
fe (6')' r Xj :i+ i_ fc ) 2 given X i:l+ i_ k are 4> k (0) T X vA+ i- k and Ufc(0), respectively. This is be- 
cause for the Gaussian processes considered in this paper, the two conditional expecta- 
tions are linear in Xi A+ i- k . Thus the pseudo-score function is a prediction-based esti- 
mating function as defined in S0rensen [26] , where estimating functions of a slightly more 
general type than in the original paper (S0rensen [25]) are treated. The generalization 
allows the predicted function to depend both on the parameter and on the previous ob- 
servations. Exploring the relation of the pseudo-score function to the optimal estimating 
function based on these predictors is of interest. 

We start by defining a class of prediction-based estimating functions. Define the (k + 
1) x 2-matrices 



Z (i) 



0---0 

and the (k + l)-dimensional vectors 

X((i + l)A)-<fe(0) T JW_ fc 



Hi(0) = Z® 



(x((i + i)A) - <t> k (9) T x i:i+1 _ k y - v k (6) 



,71 — 1. 
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Then the class of prediction-based estimating functions to which (15) belongs is given by 

71-1 

G n (0)^A(9)J2H l (e) : (16) 

i—k 

where A(9) is a p x (k + 1) matrix of weights that can depend on the parameter, but 
not on the data. The pseudo-score function (15) is obtained if the weight matrix A(0) is 
chosen as 

deMOV d e v k (6) \ 
v k (9) 2v k (9) 2 )' 

Within the class of estimators obtained by solving the estimating equation G n (9) = for 
some choice of A(9), the estimator with the smallest asymptotic variance is obtained by 
choosing the optimal weight matrix A* (9). The optimal estimating function is the one 
closest to the true score function in an L 2 -sense (for details, see Heyde [11]). 

We now can find the optimal weight matrix A* (9). The covariance matrix of the 
(k + 1) -dimensional random vector Y^ii=k (8) / y/n — k is 

M n {6) = M^{6) + M^{e), (17) 

where 

n— k — 1 / , .\ 

M n 2) W= E ( ~J [Ee(H k (6)H k+j (Gf)+Ee(H k+j (6)H k (6) T ] 
j=i \ n ' 

and 

M^\9)=E e {HmHm T )=( Vk{0 £f ) 2 ^ )2 ), 

with Oj 1 _j 2 denoting here and later the j\ x j 2 -matrix f Os, and with K, k {9) denoting the 
covariance matrix of (X(kA), . . . ,X(A)) defined in Section 2. We have used Eg(Hi{9)) = 
0, which is a general property of prediction-based estimating functions (see S0rensen [26]). 
In this particular case, this is easily seen by finding the conditional expectation of Hi (9) 
given which is 0. 

To find the optimal estimating function, we also need the px (fc + 1) sensitivity-matrix 
S(9), given by 

s{9r=E e { deT Hm=-{^ h fXte) 6) )- (18) 

For the derivation of M^{9) and S(9), we use that the model is Gaussian and, in 
particular, we use that (f> k (9) T Xi : i + i- k is the conditional expectation of X((i + 1)A) 
and not just the minimum mean squared linear predictor as in the general theory of 
prediction-based estimating functions. The optimal weight matrix is given by 




see S0rensen [26]. 



A* n (6) = -S(6)M n (6)-\ 



(19) 
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The class of estimating functions considered above is not the full class of prediction- 
based estimating functions to which (15) belongs, as defined by S0rensen [25] and 
S0rcnsen [26]. The full class is obtained by replacing in (16) A(9) with a p x 2(k + 1) 
matrix and Hi{6) with the 2{k + l)-dimensional vectors Hi{9) obtained when is 
replaced by the 2(fc + 1) x 2 matrix, 

m-f^x-H 1 () ■'■ 
V o 1 , k i o xg +1 _ k 

in the definition of Hi(9). In this way, Hi(9) is extended by k + 1 extra coordinates. 
Because the moments of an odd order of a centered multivariate Gaussian distribution 
equal 0, we see that the extra k + 1 coordinates of Hi{9) have expectation under the 
true probability measure irrespective of the value of the parameter 9; therefore, they 
cannot be expected to be a useful addition to Hi{9). However, the extra coordinates 
might be correlated with the coordinates of Hi(9), and thus might be used to reduce 
the variance of the estimating function. To see that this is not the case, the optimal 
estimating function based on Hi(9) can be calculated. The covariancc matrix of the 
random vector Y^"=k H^ l \9) / yfri^k can be shown to be a block-diagonal matrix with 
two (k + 1) x (k+ l)-blocks, the first of which equals M n {9). This follows from the fact 
that moments of an odd order of a centered multivariate Gaussian distribution equal 0. 
Moreover, the sensitivity matrix corresponding to Hi(9) is 

(lC k {9)d e T^ k {9y 

S{9) T = Ee{d eT H t {9)) = - d gT v k {9) 

V o k +i, P 

Therefore, the optimal weight matrix is 

A* n {9) = {A* n {9) O Plfc+ 0, 

and thus the optimal prediction-based estimating function obtained from Hi(9) equals 
the optimal estimating function obtained from Hi(9). It is therefore sufficient to consider 
the aforementioned smaller class of prediction-based estimating functions, which we do 
in the rest of the paper. 

The pseudo-score function, dg£ n (9), is not equal to the optimal prediction-based esti- 
mating function. In fact, 

A{6) = -S{6)M ( > 1) {6)- X . (20) 

The magnitude of the difference between the two estimating functions depends on how 
small the entries of Mn (9) are relative to the entries of M^\9). Because correlations 
decrease exponentially with the distance in time (see Reifi [21], page 26), the terms in 
the sum defining m£ ] (9) can be small compared with the entries of M^(9); however, 
under what conditions this occurs and exactly how small the terms are depend on 9, A, 
and k. 
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In the next section we show that the limit 

oo 

M^(9) = lim Mg\6) = Y\E {H k {6)H k+] {O) T ) + E e {H k+j {9)H k {9) T )] (21) 

n— yoo * — * 

exists. Therefore, we can define the following weight matrix, which does not depend on n: 

A*{6) = -S(6)M(9)- 1 , (22) 

where 

M{6) = lim M„(0) = M {1) (9) + M {2) {9). (23) 

n— >-oo 

The estimating function 

6£(0) = A* (24) 

i=fe 

is asymptotically optimal and theoretically is easier to handle than A* n (9) J2i= k .Hi{6). 
In practice, the optimal weight matrices A^(9) or A* (9) usually must be calculated by 
simulation. The amount of computation can be reduced by using the approximation to 
G* n {9) obtained by replacing A* (9) or A* n (9) with the matrix obtained from (22) and (23) 
when M^ 2 \9) is replaced by a suitably truncated version of the series in (21). This does 
not make much difference, because the terms in the sum (21) decrease exponentially fast. 

4. Asymptotics of the pseudo-likelihood estimator 

In this section, we present the asymptotic properties of estimators obtained by solving the 
estimating equation G„(9 n ) = 0, where G n is given by (16). Important particular cases 
of this are the maximum pseudo-likelihood estimator obtained by maximizing (6) and 
the optimal prediction-based estimator obtained by solving G* (6 n ) = with G* given 
by (24). The depth, k, of G n is assumed fixed. The asymptotic properties are proven for 
a solution to the general equation (1) under the following assumption: 

Condition 4-1- 

(a) The functions Kg(t) and A(9) are twice continuously differ entiable with respect 
to 9. 

(b) The p x (fc + 1) matrix (de^iQ) dev k (9)) has rank p (in particular, k + 1 > p). 

(c) A(6)K$ k (9 )-$ k (9))=0 if and only if 9 = Q . 

Here 



(25) 
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and 




sponding to the optimal prediction-based estimating function), then Condition 4.1(a) is 
satisfied if Kg{t) is three times continuously diffcrcntiable, which is the case for the mod- 
els considered in Examples 2.1 and 2.2. Condition 4.1(a) ensures that the functions (f>k{9) 
and Vk(0) are continuously diffcrcntiable. Condition 4.1(c) is an identifiability condition 
that ensures eventual uniqueness of the estimator. 

Theorem 4.2. Assume that the true parameter value 9$ belongs to the interior of the 
parameter space 0. Suppose that Condition 4-1 is satisfied, and that the matrix A(9q) has 
full rank p. Then a consistent estimator 9 n that solves the estimating equation G n (9 n ) = 
exists and is unique in any compact subset of containing 9$ with a probability tending 
to 1 as n — V oo. Moreover, 



Here S(9) is the sensitivity matrix given by (18). 

Note that it follows from (19) and (20) that A*(9 ) and A(9 ) have rank p if S(9 a ) 
has rank p, because M n and M^ 1 ' are non-singular covariancc matrices. That S(9q) has 
rank p follows from Condition 4.1(b) by (27) below. 

Proof of Theorem 4.2. The theorem follows from general asymptotic statistical results 
for stochastic processes (see, e.g., Jacod and S0rensen [13]). We need to establish that a 
law of large numbers and a central limit theorem hold and to check regularity conditions. 

Under our general assumption that X is stationary, Reifi [21] (page 25) showed that 
X is exponentially /3-mixing. Therefore, a law of large numbers holds for sums of the 
form n~ l J2j=i f (■^■i+k-.i) ■ The process {Hi(9o)} is exponentially a-mixing, and because 
the process X is Gaussian, Hi(9) has moments of all orders. Therefore, it follows from 
Theorem 1 in Section 1.5 of Doukhan [5] that (21) converges, and that 



Next, we need to check regularity conditions that ensure the asymptotic results. 
The estimating function satisfies that Eg (G n (9a)) = 0. Furthermore, it is obvious from 
the continuous differentiability of the functions 4>k(9) and Vk{9) that the derivatives 
dg j A(9)Hi(9) + A(9) dg j H i (9) are locally dominated integrable. Finally, the matrix U(9q) 



V^{6„ - ) ^ N p (0, U(9o)- 1 V(9 )(U(9 )- 1 ) T ) 
as n->oo, where V(9 ) = A(9 )M(9 )A(9 ) T with M(6 ) given by (23), and 
U(6 ) = Eg (dgG n (9 Q ) T )/(n - k) = S(9 )A(9 ) T . 



(26) 




as n — y oo. 
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is invertible because A(9o) has full rank and 

S(6 )^-(d e cl > l(9o) d e v k (6 ))iC, (27) 

with K, given by (25). The first matrix has full rank by Condition 4.1(b), and K, 
is invertible because JC k (9o) is the covariance matrix of X(A), . . . ,X(kA), which is 
not degenerate. Now the existence and consistency of 9 n , as well as the eventual 
uniqueness of a consistent estimator on any compact subset of 6 containing 9q, fol- 
low (see Jacod and S0rensen [13]). The locally dominated integrability of A(9)Hi(9) 
(which follows from Condition 4.1(a)) implies that n~ l G n {9) converges uniformly to 
A{9)E 6q {H 1 {9)) = A(9)K(4> k (9 ) - 4>k{6)) for 9 in a compact set. The fact that the limit 
is a continuous functions of 9 and satisfies A{9)Eg {Hi(9)) ^ for 9 ^ 9q implies that 
any non-consistent solution to the estimating equation will eventually leave any compact 
subset of containing 9q. The asymptotic normality follows by standard arguments (see, 
e.g., Jacod and S0rensen [13]). □ 

A simpler estimator with the same asymptotic distribution as in the estimator from 
(16) is obtained from the estimating function 

n-1 

G° n (9) = A(6° n )Y,H-i(6), 

i—k 

where 0° is some consistent estimator of 9, obtained, for instance, by simply using p 
suitably chosen coordinates of Hi(9). For this estimating function, the identifiability 
condition Condition 4.1(c) can be replaced by the following condition: 

Condition 4-3. 

(a) The function (<f> k (9),v k (0)) is one-to-one. 

(b) 4>k{&o) ~ 4>k{9) € N 1 - for all 9 £ O, where N is the null space of the matrix A(9q)K.. 

This readily follows from the fact that the limit of (9) is A{6 )K{4> k {6 ) - (j> k (8))- 

In the case of the pseudo-likelihood function, we have the simple expression 

A{e )K = {v k {6 )- 1 d e( l>l{e a )K k {6 ) \v k {9 )- 2 dgv k (9 )). 

Condition 4.3(a) is a basic assumption without which there is no hope of estimating 
9 using the pseudo-score function (15). The condition must be checked for individual 
models. Obviously, it is not always satisfied, as demonstrated by the following examples. 
Consider the model in Example 2.1 with the restriction that b = a. For this model, the 
autocovariance function depends on r and b only through r — b^ 1 > for t G [0, r]. In 
Example 2.2 with the restriction that a — 0, the autocovariance function depends on r 
and b only through r\fb for t £ [0,r]. 

Theorem 4.2 implies that the asymptotic distribution of the optimal prediction-based 
estimator, is 

Vn~(9* n - 9 ) A N p (0, (S(9 Q )M(9 Q )- 1 S(9 a ) T )- 1 ), (28) 
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and the asymptotic distribution of the pseudo-likelihood estimator, 9 ni is 

\fc{6 n - Oo) A N p (0, W(eo)- 1 + W(e Q )- 1 B(9o)W(9 )- 1 ), (29) 

where 

W(9) = S(9)M^(0)- 1 S(6f = deMO) T IC k (9)do T MO) + dev^d^B) 

Vk(9) 2v k (9y 

and 

B{9) ^A(9)M {2 \9)A(9) T = S(9)M (1 \9)- 1 M (2) (9)M (1) (9)- 1 S{9) T . 
The result for 9* follows because 

-S(9 a )A*(9 Q f = A*(9 )M(9 )A*(9 ) T = S(9 )M(9r 1 S(9 Q ) T , 
and the result for 9 n follows because 

-S(9 Q )A(9 ) T = S(9 Q )M^(9 )- 1 S(9 ) T 

and 

A(9 )M(9 )A(9 f = S(9 Q )MW(9 a )- 1 S(9 Q ) T + A(9 Q )M^(9 )A(9 Q ) T . 

According to the general theory of estimating functions (see, e.g., Heyde [11]), the 
matrix S(9 )M(9 )- 1 S(9 ) T - (W^a)- 1 + W(9 )~ 1 B(9 )W(9 )' 1 )- 1 is positive definite; 
that is, the asymptotic covariance matrix of 9 n is larger than that of #* (in the usual 
ordering of positive semi-definite matrices). Thus the asymptotic variance of f{9 n ) is 
larger than that of f{9* l ) for any differcntiable function / : M. p i— > R. If B(9$) is invertible, 
then 

[W^o)- 1 + W(9 )- 1 B(e )W(9 )- 1 ]- 1 = W(9 ) - [BiOo)- 1 + W^o) -1 ] -1 , 
and if M^(9 ) is invertible, then 

M(eo)- 1 =M( 1 )(&o)- 1 -MW(eo)- 1 [MW(eo)" 1 + M( 2 )(^o)" 1 ]" 1 MW(^o)" 1 , 

where we have used twice that (I + A)^ 1 = I — A(I + A)^ 1 for a matrix A. Thus the 
difference between the two inverse asymptotic covariance matrices can be expressed as 

S(9 Q )M(9o)- 1 S(9of - [W(9 Q )- 1 + W(9 Q )- 1 B(9 Q )W(9 Q )- 1 }- 1 
= [B(9 Q )- 1 + W(9 Q )- 1 ]- 1 

- S(9 Q )M^(9 )- 1 [M^(9o)- 1 + M^(9 r 1 ]- 1 M {1 \9 )- 1 S(9 ) T (30) 
= [(A(9 Q )M^(9 )A(9 ) T y 1 + (A(9 a )M^(9 Q )A(9 ) T )- 1 ]- 1 

- A(9 Q )[M^(9 )- 1 + M^(9 Q )- 1 ]- 1 M0o) T . 
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It is considerably easier to calculate the pseudo- likelihood function (6) than the optimal 
estimating function (24), because the latter involves derivatives with respect to 8 of the 
covariance function and higher-order moments of X. In particular, in cases where the 
covariance function is not explicitly known and must be determined by simulation, it is 
much easier to calculate (6) than (24). Thus the maximum pseudo- likelihood estimator 
is preferred in practice. The formula (30) then can be used to assess whether the loss of 
efficiency relative to the optimal estimator is acceptable. 

Example 4-4- As an example, we calculated the efficiency loss for the model (7) in Ex- 
ample 2.1 in a number of cases. When k is sufficiently large, the pseudo- likelihood function 
is almost efficient, and thus the information loss (30) is necessarily small. Therefore, it 
is most interesting to calculate the efficiency loss when k is small. We calculate the rel- 
ative information loss, that is, the information loss (30) relative to the information for 
the optimal estimator given by (28). The main problem is to calculate the matrix (21). 
However, for k = 1, a simple expression for each term in the sum (21) can be obtained 
using the formula of Isserlis [12], and so a suitably truncated version of the sum can 
be easily calculated. For k > 2, the matrix (21) can be determined by simulation using 
(23). Specifically, we determine the covariance matrix M n (6o), with n suitably large, by 
simulation. This is computationally more demanding. 

We first considered the efficiency loss for the parameter b in the case k = 1. The 
parameters a 2 and r were fixed at a value of 1 , and a was chosen equal to — 1 . For b = 
— e~ 2 = —0.1353 (the value for which the mixing rate is maximal), the relative information 
loss was found to be very small, less than 0.1 percent for A = 0.1, A = 0.5, and A = 1. 

Next, we calculated the information loss for a number of values of b with A = 1. 
For b= —0.3,-0.06,0.05,0.1,0.2,0.3, and 0.5, the mixing rate is relatively high, and the 
information loss is less than 0.1 percent. For b = —0.5, —0.4, 0.7, and 0.9, the information 
loss is between 0.1 and 1 percent, whereas for b = —0.6, —0.7, and —0.9, it is 1.4 percent, 
3.0 percent, and 9.8 percent, respectively. 

Finally, we calculated the relative the information loss for k = 3 and k = 5. In this 
case, information loss was calculated for both a and b. The parameters a, a 2 , and r had 
the same value as before, and A = 1. For b = — e -2 , the relative information loss for both 
a and b is less than 0.1 percent for both values of k. For b = —0.5 the information loss is 
less than 0.1 percent for b and 0.2 percent for a. 

In most cases, the relative information loss is so tiny that in practice it is preferable 
to use the maximum pseudo-likelihood estimator. The information loss increases as the 
mixing rate decreases. Only for k — 1 and b — —0.9 is the information loss large enough 
to justify the use of the more complicated optimal estimator. 

5. Simulation study 

In this section we report the results of a simulation study in which we investigated 
some properties of the pseudo-likelihood estimator introduced in Section 2. We restrict 
ourselves to the model considered in Example 2.1 and to estimating 9 = (a, b). The 
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delay time r is chosen equal to 1, and a 1 is fixed at 1. This study was not intended 
to serve as a complete simulation study; rather, the intention was to illustrate some 
properties of the estimator and give a first impression of how the joint distribution of 
the two-dimensional estimator 9 n = (a n ,b n ) depends on the time between observations 
A, the depth k of the pseudo-likelihood function, and the true parameter value. We 
performed simulations for three values of 9: 9 = ( — 1,0.95) near the upper boundary of 
the domain of stationarity, 9 = (—1, — 1/c 2 ) = ( — 1, —0.1353) which is the parameter value 
with the highest possible mixing rate for the stationary solution X when a = — 1, and 
9 = ( — 1, —2.1) near the lower boundary of the domain of stationarity. For each parameter 
value, we considered four sampling frequencies with the same number of observation 
time points (200); specifically, the observation time points were iA, i = 1, . . . , 200, with 
A = 0.05, 0.1, 0.5, 1. The simulations of the SDDE were conducted with a step size of 
0.001. In all cases, 1000 data sets were simulated, and thus 1000 estimates were generated. 
For each data set, a new trajectory of the driving Wiener process was generated. The 
full simulation study is reported in Kuchler and S0rensen [18]. 

Table 1 reports the mean values and standard deviations of the simulated estimates 
of a and b for (a,b) = (—1,-0.1353). For (a, b) = (—1,0.95), the estimators are more 
biased and have a larger standard deviation for small values of A and k, whereas for 
[a, b) = (—1, —2.1), the bias is small and the standard deviations are comparable in all 
cases. For (a, b) = (—1, 0.95), the estimators of a and b are highly correlated, whereas this 
is the case only for small values of A and k for the other parameter values. 

The most remarkable observations from our simulation study can be summarized as 
follows: 

• For a fixed number of observation time points, the bias and standard deviation of 
the estimators worsen as the time between observations A decreases, at least when 
A < r. For A > r, the quality does not change much with A, and whether the bias 
and variance increase or decrease with A depends on the parameter value. 

• The smaller the A value, the more the choice of the depth k of the pseudo-likelihood 
functions influences the quality of the estimators when A < r. For A > r, the im- 
portance of k increases again for some parameter values. 

• It is surprising that a similar pattern is seen when the length of the observation 
interval nA is fixed so that the sample size decreases as A increases. However, here 
there is a clearer tendency for the estimators to deteriorate when A > r, so that 
there is an optimal value of A, which seems to be around r. 

• The absolute value of the correlation between a and b decreases with increasing 
depth k to a limit, which is strongly dependent on the true parameter value. Near 
the upper boundary of the stability region, the estimators are highly correlated. 
A high absolute value of the correlation indicates that it is difficult to distinguish 
between the effects of the lagged term and the nonlagged term in the drift; thus, it 
is not surprising that the absolute correlation is large when the depth is small. 

• For small values of the depth k, the joint distribution of the estimators of a and b 
can deviate from a two-dimensional normal distribution by having crescent-shaped 
contours. 
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Table 1. Mean and standard deviation of the pseudo-likelihood estimator of a (upper part of the 
table) and b (lower part of the table) for various values of depth k, time between observations A, 
and number of observations n. In all cases nA, the length of the observation interval, is 200, 
and the true parameter values are a = — 1 and b = —0.1353 



k 



A 


n 


1 


3 


5 


7 


9 


13 




20 


0.05 


4000 


— 1.73 


—1.07 


— 1.04 


— 1.02 


— 1.02 


_p 


.01 


— 1.01 






2.32 


0.21 


0.14 


0.11 


0.11 





10 


0.09 


0.1 


2000 


— 1.27 


— 1.03 


— 1.03 


—1.01 


-1.01 


-1 


01 


—1.01 






0.83 


0.13 


0.10 


0.09 


0.09 





09 


0.09 


0.5 


400 


-1.04 


-1.01 


-1.01 


-1.01 


-1.01 


-1 


01 


-1.01 






0.14 


0.10 


0.09 


0.09 


0.10 





09 


0.10 


1.0 


200 


-1.02 


-1.01 


-1.01 


-1.02 


-1.02 


-1 


01 


-1.02 






0.12 


0.11 


0.11 


0.11 


0.11 





11 


0.12 


2.0 


100 


-1.10 


-1.04 


-1.03 


-1.03 


-1.04 


-1 


04 


-1.04 






0.43 


0.21 


0.19 


0.18 


0.21 





19 


0.17 


0.05 


4000 


0.42 


-0.09 


-0.11 


-0.15 


-0.13 


-0 


14 


-0.14 






2.66 


0.44 


0.31 


0.23 


0.19 





14 


0.09 


0.1 


2000 


0.08 


-0.13 


-0.14 


-0.14 


-0.14 


-0. 


13 


-0.13 






1.14 


0.27 


0.18 


0.13 


0.11 





09 


0.09 


0.5 


400 


-0.12 


-0.14 


-0.14 


-0.13 


-0.14 


-0 


14 


-0.14 






0.28 


0.10 


0.11 


0.11 


0.11 





11 


0.10 


1.0 


200 


-0.14 


-0.14 


-0.14 


-0.13 


-0.14 


-0. 


13 


-0.13 






0.16 


0.13 


0.13 


0.13 


0.13 





13 


0.14 


2.0 


100 


-0.26 


-0.15 


-0.14 


-0.13 


-0.15 


-0 


14 


-0.14 






0.65 


0.29 


0.27 


0.25 


0.30 





27 


0.24 
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